KOSAC: A Full-Fledged Korean Sentiment Analysis Corpus

Authors

  • Hayeon Jang
  • Munhyong Kim
  • Hyopil Shin
Abstract

This paper introduces the Korean Sentiment Analysis Corpus (KOSAC). KOSAC consists of 332 news articles taken from the Sejong Syntactic Parsed Corpus, whose sentences have been manually tagged for sentiment features. The corpus includes 7,713 sentence subjectivity tags and 17,615 opinionated expression tags based on an annotation scheme called KSML, which reflects the characteristics of the Korean language. The results of sentence subjectivity and polarity classification experiments using the corpus suggest that the KSML scheme and the tagged information in KOSAC can be applied broadly to other corpora. What is innovative about our work is that it brings together the concepts of private states and nested sources in one linguistic annotation scheme. We believe that this corpus could be used by researchers as a gold standard for various NLP tasks related to sentiment analysis.

Similar resources

Annotation Scheme for Constructing Sentiment Corpus in Korean

This paper describes the first year of work constructing the Korean Sentiment Corpus, focusing on the theoretical background such as the annotation scheme. Our aim is to provide a solid theoretical background for the corpus which reflects the characteristics of the Korean language and includes approximately 8,050 sentences taken from news articles. The corpus annotation scheme, based on the MPQ...

Effective Use of Linguistic Features for Sentiment Analysis of Korean

In this paper, we propose a new linguistic approach for sentiment analysis of Korean. In order to overcome shortcomings of previous works confined to statistical methods, we make effective use of various linguistic features reflecting the nature of Korean such as contextual intensifiers, contextual shifters, modal affixes, and the morphological dependency chunk structures. Moreover, unlike comp...

Language-Specific Sentiment Analysis in Morphologically Rich Languages

In this paper, we propose language-specific methods of sentiment analysis in morphologically rich languages. In contrast to previous works confined to statistical methods, we make use of various linguistic features effectively. In particular, we make chunk structures by using the dependence relations of morpheme sequences to restrain semantic scope of influence of opinionated terms. In conclusio...

Analyzing the Relationship Between Tweets, Box-Office Performance, and Stocks

Sentiment analysis, a powerful tool for determining public opinion, is often applied to corpora of full-length documents. Guided by recent forays into sentiment analysis on micro-blogging platforms, we will examine popular opinion of movies and stocks and compare them to real-world indicators of popularity (e.g., box office).

NoReC: The Norwegian Review Corpus

This paper presents the Norwegian Review Corpus (NoReC), created for training and evaluating models for document-level sentiment analysis. The full-text reviews have been collected from major Norwegian news sources and cover a range of different domains, including literature, movies, video games, restaurants, music and theater, in addition to product reviews across a range of categories. Each r...

Publication date: 2013